The resources needed to conventionally characterize a quantum system are overwhelmingly large for
high-dimensional  systems.  This  obstacle  may  be  overcome  by  abandoning  traditional  cornerstones  of 
quantum  measurement,  such  as  general  quantum  states,  strong  projective  measurement,  and  assumption- 
free  characterization.  Following  this  reasoning,  we  demonstrate  an  efficient  technique  for  characterizing 
high-dimensional,  spatial  entanglement  with  one  set  of  measurements.  We  recover  sharp  distributions  with 
local,  random  filtering  of  the  same  ensemble  in  momentum  followed  by  position — something  the 
uncertainty  principle  forbids  for  projective  measurements.  Exploiting  the  expectation  that  entangled  signals 
are  highly  correlated,  we  use  fewer  than  5000  measurements  to  characterize  a  65,536-dimensional  state. 
Finally,  we  use  entropic  inequalities  to  witness  entanglement  without  a  density  matrix.  Our  method 
represents  the  sea  change  unfolding  in  quantum  measurement,  where  methods  influenced  by  the 
information  theory  and  signal-processing  communities  replace  unscalable,  brute-force  techniques — a 
progression  previously  followed  by  classical  sensing. 
I. INTRODUCTION

Practicing  experimentalists  most  commonly  perform 
quantum  measurement  in  the  context  of  state  and  parameter 
estimation  [1].  While  great  historical  emphasis  has  been 
placed  on  using  measurement  to  probe  the  validity  of 
quantum  mechanics  itself — where  measurements  must  not 
only  agree  with  quantum  predictions  but  also  rule  out  any 
competing  explanations  [2] — state  estimation  accepts 
quantum theory a priori. Here, measurements on identically
prepared copies of a system are used to generate a
model  from  which  testable  predictions  can  be  made  about 
future  measurement  statistics  [3].  This  point  of  view  lifts 
the  burden  of  validation,  leading  to  simpler  experiments 
and  technologies. 

Even  so,  quantum-state  estimation  remains  a  persistent 
obstacle  for  scaling  quantum  technologies.  The  familiar 
approach  of  quantum  tomography  (QT)  scales  at  least 
quadratically poorly with added dimensions and exponentially
poorly with added particles. QT in an N-dimensional
Hilbert space requires of order $N^2$ measurements [4];
when N is a prime power, N projections are taken in each
of N + 1 mutually unbiased bases [5]. For example,
tomography of a single-spin qubit (N = 2) requires dividing
the ensemble three ways, where expectation values of the X,
Y, and Z spin components are separately measured. For most
nontrivial quantum systems, traditional, brute-force QT is
unmanageable in the lab. In particular, continuous-variable
degrees of freedom, such as transverse position and transverse
momentum or energy and time, where N → ∞, cannot
be realistically characterized via QT [6].
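As a rough illustration of the scaling just described, the sketch below counts projections for brute-force QT using the stated rule (N outcomes in each of N + 1 mutually unbiased bases). The function name `mub_projection_count` is ours, not from the cited references, and the count ignores the repetitions needed to build up statistics.

```python
def mub_projection_count(dim: int) -> int:
    """Projections for brute-force QT: dim outcomes in each of dim + 1 bases."""
    if dim < 2:
        raise ValueError("dimension must be at least 2")
    return dim * (dim + 1)


# Single-spin qubit: three bases (X, Y, Z) with two outcomes each.
qubit_count = mub_projection_count(2)

# 65,536 = 2**16 is a prime power, so the mutually-unbiased-bases rule applies.
joint_count = mub_projection_count(65536)

print(qubit_count, joint_count)
```

For the 65,536-dimensional discretization considered in this work, the count is of order $10^9$, which is the gap the fewer-than-5000-measurement approach is meant to close.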

Efforts  to  overcome  the  limitations  of  QT  fall  into  three 
major  categories.  First,  often  only  a  subset  of  a  system’s 
behavior  is  of  interest;  e.g.,  if  one  only  needs  to  predict  a 
qubit’s  spin  along  one  axis,  information  about  the  other  two 
is  irrelevant.  The  general  tomographic  density  matrix  can 
be  discarded  here  in  favor  of  simpler  models  [7].  A 
practical  example  is  quantum  key  distribution  (QKD), 
where only two (instead of order N) bases, such as energy
and time, need to be characterized [8]. Many entanglement
witnesses only require a small subset of possible measurements
to confirm entanglement [9,10].

Second,  one  can  leverage  prior  knowledge  about  a 
system.  In  standard  tomography,  maximum  likelihood 
estimation  is  used  to  find  a  valid  density  matrix  consistent 
with  measurement  data  [11,12] — a  simple  assumption  that 
quantum  mechanics  holds.  Or,  given  a  model  of  the 
physical  system,  one  can  begin  with  a  prior  distribution 
which is updated or parametrized in response to measurements,
as in Bayesian inference [13,14].

One  powerful  presupposition  is  that  a  signal  is 
structured,  or  compressible.  For  classical  signals,  this 
surprisingly broad assumption spawned the field of
compressed  sensing  (CS)  to  tremendous  multidisciplinary 
impact  [15,16]  with  a  strong  presence  in  imaging  [17-20]. 
In  compressed  sensing,  signals  are  compressed  during 
measurement  so  they  can  be  sampled  below  the  Nyquist 
limit  [21].  Several  recent  efforts  apply  CS  to  quantum 
measurement  to  dramatic  effect  [22-26] — in  some  cases, 
reducing  measurement  times  from  years  to  hours  [27].  For 
tomography,  all  protocols  exploiting  positivity  are  a  form  of 
compressed  sensing  [28]. 

Finally,  one  can  choose  measurements  well  suited  to  the 
model and prior knowledge. There is a compelling movement
beyond traditional, projective measurements that
localize quantum particles. Notably, there is weak measurement,
where a system and measurement device are very
weakly coupled, leaving the system nearly undisturbed
[29]. With weak measurement, researchers have directly
measured the quantum wave function [30], observed
average trajectories of particles in the double-slit experiment
[31], and performed tests of local realism [32]. More
recently, we investigated partially projecting measurements
that lie somewhere between weak and projective measurement.
Using random, binary filtering in position followed
by  strong  projections  in  momentum,  we  measured  the  sharp 
image  and  diffraction  pattern  of  a  transverse  optical  field 
without  dividing  the  initial  ensemble,  a  feat  impossible  for 
strong,  projective  measurements  [33].  With  nonprojective 
measurement,  the  conventional  wisdom  that  incompatible 
variables  must  be  separately  investigated  is  discarded. 

Guided  by  these  principles,  we  demonstrate  a  novel 
approach  for  efficiently  witnessing  large-dimensional 
entanglement  with  a  single  set  of  measurements.  We  apply 
this technique to Einstein-Podolsky-Rosen (EPR) correlations
in the spatial degrees of freedom of the biphoton state
produced in spontaneous parametric down-conversion
(SPDC), a system closely resembling the EPR gedankenexperiment
[34,35]. Inspired by the random measurements
used in CS, we show that random, local, partial projections
in momentum followed by random, local, partial projections
in position can be used to efficiently and accurately
image EPR correlations in both domains. The ensemble is
not split: position and momentum measurements are
performed on the same photons. Remarkably, the measurement
disturbance introduced by the momentum filtering
manifests as a small amount of additive noise in the position
distribution, which remains unbroadened. This allows the
position and momentum measurements to be decoupled,
and the joint probability distributions to be recovered in
a 65,536-dimensional discretization of the infinite-dimensional
Hilbert space. Our measurements do not violate
the  uncertainty  principle;  rather,  they  highlight  the  complex 
and  subtle  behavior  of  measurement  disturbance  given 
nonprojective  measurements. 

Exploiting  our  expectation  that  the  distributions  are 
highly correlated, we use compressive sensing optimization
techniques to dramatically under-sample; we need fewer
than 5000 measurements to obtain high-quality distributions.
By comparing the conditional Shannon entropy in the
position  and  momentum  joint  distributions,  we  witness 
high-dimensional  entanglement  and  determine  a  quantum 
secret  key  rate  for  the  joint  system  without  needing  a 
density  matrix. 

II.  THEORY 

A.  Random,  partially  projective  measurements 
of  an  EPR  state 

Consider a two-photon quantum state $|\psi\rangle$ encoded in the
transverse-spatial degrees of freedom of the biphoton
produced by SPDC. SPDC is a nonlinear-optical process,
where a high-energy pump photon is converted into two
lower-energy daughter photons, labeled signal and idler.
Conservation of momentum dictates that the signal and
idler momenta be anticorrelated for a plane-wave pump.
Conservation  of  “birthplace,”  the  notion  that  both  photons 
originate  from  the  same  location  in  the  crystal,  dictates 
positive  correlations  in  the  daughters’  transverse  positions. 

Strong  correlations  in  incompatible  observables  are  a 
signature  of  entanglement — in  fact,  the  original  EPR 
paradox  was  described  using  position  and  momentum 
[34].  EPR  considers  the  ideal  state 

$$
|\mathrm{EPR}\rangle = \int dx_1\, dx_2\, \delta(x_1 - x_2)\, |x_1, x_2\rangle
= \int dk_1\, dk_2\, \delta(k_1 + k_2)\, |k_1, k_2\rangle, \tag{1}
$$

perfectly  correlated  in  position  and  perfectly  anticorrelated 
in  momentum.  Although  the  ideal  EPR  state  is  non- 
normalizable  and  consequently  impossible  to  realize  in 
the  lab,  the  biphoton  state  generated  via  SPDC  is  very 
similar  [36,37]. 

EPR correlations are observed by measuring the joint
probability distribution in position, $|\psi(x_1,x_2)|^2$, and in
momentum, $|\tilde{\psi}(k_1,k_2)|^2$. Because these domains of interest
are known in advance, only these two distributions are
needed, not a full density matrix. Spatial correlations are
usually measured by jointly raster scanning single-element,
photon-counting detectors through either the near field
(position) or far field (momentum) [38]. This approach
scales extremely poorly with increased single-particle
dimensionality n: measurement time scales between $n^3$
and $n^4$. For a typical source, this could take upwards of one
year for a modest n = 32 × 32 pixel resolution [27].

To  avoid  dividing  the  ensemble,  and  to  require  many 
fewer  measurements,  we  instead  apply  local,  partially 
projective  measurements  in  momentum  followed  by  local, 
partially  projective  measurements  in  position,  to  the  same 
photons. Our approach is illustrated in Fig. 1. The signal
and idler photons from an EPR-like state $\psi(x_1,x_2)$ are
separately allowed to propagate to the far field. Here, each
photon is locally filtered by a random, binary mask $f_i^{(k)}(k_1)$
(signal) or $g_i^{(k)}(k_2)$ (idler), where subscript i refers to a
particular pair of filters. Each local filter is an n-pixel,
binary intensity mask, where individual pixels fully transmit
(T) or fully reject (R) with equal probability. The
momentum filtering enacts a significant partial projection
of $|\psi\rangle$; on average, half of the local intensity and three-quarters
of the joint intensity is rejected, so this is not a
weak measurement.

FIG. 1. Sequential, partial projections in position and momentum. The block diagram (a) describes a sequence of partially projective
measurements on an EPR entangled source. (b)-(d) Simulated joint-position and joint-momentum distributions at each point in the
experiment. Signal and idler photons from an EPR source (b) are separated and allowed to propagate to the far field (momentum). Here,
they are subjected to random binary filtering by a pixelated mask (faded gray overlay). Each pixel in the mask either fully transmits (T)
or fully rejects (R). The momentum-filtered fields (c) propagate through an optical system to an image plane of the source, where they
are again filtered with random, binary filters (d). Single-element, photon-counting detectors are placed in the T and R ports of each filter
and are connected to a coincidence circuit. The total number of coincident detection events between signal and idler channels gives a
random projection of the momentum distribution. The relative distribution of coincident detections between the T and R modes (four
possibilities) for the signal and idler photons gives a random projection of the position distribution up to a small noise floor injected by
the momentum filtering.
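The rejection fractions quoted for the momentum filters (half locally, three-quarters jointly) can be checked with a small Monte Carlo sketch. It assumes a toy, perfectly anticorrelated joint momentum distribution on n pixels; the function names and the pixel and trial counts are illustrative choices, not values from the experiment.

```python
import random


def random_binary_mask(n_pixels, rng):
    """Each pixel transmits (1) or rejects (0) with equal probability."""
    return [rng.randint(0, 1) for _ in range(n_pixels)]


def mean_transmission(n_pixels=256, trials=200, seed=0):
    """Average local and joint transmission over many random mask pairs.

    Toy assumption: P(k1, k2) = 1/n when k2 = n - 1 - k1, zero otherwise,
    so the signal marginal is uniform over the n pixels.
    """
    rng = random.Random(seed)
    local_total = 0.0
    joint_total = 0.0
    for _ in range(trials):
        f = random_binary_mask(n_pixels, rng)  # signal mask
        g = random_binary_mask(n_pixels, rng)  # idler mask
        local_total += sum(f) / n_pixels
        joint_total += sum(
            f[k] * g[n_pixels - 1 - k] for k in range(n_pixels)
        ) / n_pixels
    return local_total / trials, joint_total / trials


local_pass, joint_pass = mean_transmission()
print(f"local rejected: {1 - local_pass:.2f}, joint rejected: {1 - joint_pass:.2f}")
```

Because the two masks are independent, a photon pair passes both with probability close to 1/4, whatever the detailed shape of the joint distribution.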

All  measurements  are  subject  to  uncertainty  relations, 
which  imply  unavoidable  measurement  disturbance. 
Conventional  projective  measurements,  often  associated 
with  “wave-function  collapse,”  localize  a  quantum  state  in 
one  domain  (e.g.,  momentum)  at  the  cost  of  broadening  it 
in a conjugate domain (e.g., position). Critically, however,
random filtering does not localize the quantum state; it
maps a small amount of momentum information onto the
total intensity passing the filter. The measurement disturbance
of nonprojective measurements is best understood via
the entropic uncertainty principle

$$
h(x) + h(k) \geq \log(\pi e), \tag{2}
$$

where $h(\cdot)$ is the Shannon entropy. The entropic uncertainty
principle implies an information exclusion relation;
the  more  information  a  measurement  gives  about  the 
momentum  distribution,  the  less  information  a  subsequent 
measurement  can  give  about  the  position  distribution  [39]. 
There  are  no  restrictions,  however,  on  how  information 
loss  manifests.  In  particular,  a  measurement  in  one  domain 
need  not  broaden,  or  blur,  the  statistics  in  a  complementary 
domain. 
The joint amplitude passing the momentum filtering is
$\tilde{\psi}_f(k_1,k_2) = \tilde{\psi}(k_1,k_2)\, f_i^{(k)}(k_1)\, g_i^{(k)}(k_2)$,
where the subscript f marks the filtered state. To see the effect of
the momentum filtering on the position distribution, we
take a Fourier transform to find $\psi_f(x_1,x_2) = \mathcal{F}\{\tilde{\psi}_f(k_1,k_2)\}$,
which is given by the convolution of the state and filter
functions in the position domain:
$\psi_f(x_1,x_2) = \psi(x_1,x_2) * \big(f_i^{(x)}(x_1)\, g_i^{(x)}(x_2)\big)$.
At high resolution, the Fourier transform
of an n-pixel, random binary pattern is approximately
proportional to $\delta(x) + \sqrt{2/n}\,\phi(x)$, where values for $\phi(x)$
are taken from a unit-variance, complex, Gaussian noise
distribution: a sharp central peak riding a small noise floor
[33] (see Ref. [40]).
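The peak-plus-noise-floor structure of a random binary pattern's Fourier transform can be seen numerically. The sketch below is only illustrative: it uses a plain discrete Fourier transform on a one-dimensional mask and compares the rms amplitude of the non-central components with the central peak. The ratio falls off as $1/\sqrt{n}$; the order-unity prefactor depends on normalization conventions, so it is not expected to reproduce the $\sqrt{2/n}$ factor exactly. All names here are ours.

```python
import cmath
import math
import random


def dft(values):
    """Plain O(n^2) discrete Fourier transform (fine for a few hundred pixels)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * q * p / n) for p, v in enumerate(values))
        for q in range(n)
    ]


def floor_to_peak_ratio(n_pixels, seed=1):
    """RMS amplitude of the non-central components divided by the central peak."""
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(n_pixels)]
    spectrum = dft(mask)
    peak = abs(spectrum[0])
    floor_rms = math.sqrt(
        sum(abs(c) ** 2 for c in spectrum[1:]) / (n_pixels - 1)
    )
    return floor_rms / peak


ratio_64 = floor_to_peak_ratio(64)
ratio_256 = floor_to_peak_ratio(256)
print(ratio_64 * math.sqrt(64), ratio_256 * math.sqrt(256))
```

Both printed values should be of order one, showing that the floor shrinks relative to the peak as the pattern resolution grows.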

Because  convolution  with  a  delta  function  returns  the 
original  function,  the  perturbed  state’s  position  distribution 
is  the  true  distribution  with  some  weak  additive  noise  terms, 


$$
|\psi_f(x_1,x_2)|^2 = \mathcal{N}\, \Big| \psi(x_1,x_2) * \Big[ \big(\delta(x_1) + \sqrt{1/N}\,\phi_1(x_1)\big) \big(\delta(x_2) + \sqrt{1/N}\,\phi_2(x_2)\big) \Big] \Big|^2. \tag{3}
$$

Expanding this product in powers of $1/\sqrt{N}$, where $N = n^2$,
yields

$$
|\psi_f(x_1,x_2)|^2 = \mathcal{N} \Big\{ |\psi(x_1,x_2)|^2
+ \frac{2}{\sqrt{N}}\, \mathrm{Re}\Big[ \psi^*(x_1,x_2) \Big( \psi(x_1,x_2) * \big(\delta(x_1)\phi_2(x_2) + \delta(x_2)\phi_1(x_1)\big) \Big) \Big]
+ O(1/N) + \cdots + O(1/N^2) \Big\}, \tag{4}
$$

where $\mathcal{N}$ is a normalizing constant. Remarkably, disturbance
from filtering adds only a small noise floor, at most a
factor $\sqrt{2/N}$ weaker, without otherwise broadening the
position distribution. This can be seen in Fig. 1(c), where
the position distribution maintains tight correlations despite
the effect of momentum filtering. A rigorous derivation of
Eq. (4), including the effect of finite-width pixels, is given
in Ref. [40].

Next,  we  again  perform  random  filtering — this  time  in 
position — as  seen  in  Fig.  1(d).  The  transmitted  and  rejected 
ports  are  directed  to  single-element  “bucket”  detectors  that 
are  not  spatially  resolving.  Photon  detection  events  are  time 
correlated  with  a  coincidence  circuit. 

Each  coincidence  measurement  contains  information 
about  both  position  and  momentum;  these  must  be 
decoupled  to  fit  a  measurement  model, 

$$
Y^{(k)} = A K + \Phi^{(k)},
\qquad
Y^{(x)} = B X + \Phi^{(x)} + \Gamma^{(x)}. \tag{5}
$$

Here, K and X are N-dimensional signal vectors representing
$|\tilde{\psi}(k_1,k_2)|^2$ and $|\psi(x_1,x_2)|^2$, and A and B are M × N
sensing matrices. $Y^{(k)}$ and $Y^{(x)}$ are measurement vectors
whose elements are the inner product of K or X onto the ith
row (or sensing vector) of A or B. Noise vectors $\Phi$
represent additive measurement noise. Noise vector $\Gamma^{(x)}$
represents the noise injected by momentum filtering.

Momentum information is encoded in the total coincidences
between all detection modes. Each row of A is the
Kronecker product of two random, single-particle sensing
vectors, $A_i = a_i^{(k_1)} \otimes a_i^{(k_2)}$, where, for
example, $a_i^{(k_1)}$ encodes $f_i^{(k)}(k_1)$.

Position information is encoded in the relative distribution
of coincidences between signal and idler T and R
modes. By adding coincidences between like modes
(TT and RR) and subtracting coincidences between
differing modes (TR and RT), the effect of momentum
filtering is removed up to injected noise. Like momentum,
the position-sensing vector is a Kronecker product of
two local sensing vectors: $B_i = b_i^{(x_1)} \otimes b_i^{(x_2)}$. However,
because of the relative measurement, the local sensing
matrices take values “1” for transmitting pixels and “−1”
for rejecting pixels.
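The statement that adding like-mode and subtracting unlike-mode coincidences yields a ±1-valued Kronecker sensing vector can be verified directly. The sketch below is a toy check on an arbitrary small joint distribution; it ignores momentum filtering and detector noise, and the helper names are ours.

```python
import random


def coincidence_ports(joint, b1, b2):
    """Coincidence weight in each of the TT, RR, TR and RT port combinations."""
    ports = {"TT": 0.0, "RR": 0.0, "TR": 0.0, "RT": 0.0}
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            key = ("T" if b1[i] else "R") + ("T" if b2[j] else "R")
            ports[key] += p
    return ports


def signed_projection(joint, b1, b2):
    """Inner product of the joint distribution with the +/-1 Kronecker vector."""
    s1 = [2 * b - 1 for b in b1]  # 1 -> +1 (transmit), 0 -> -1 (reject)
    s2 = [2 * b - 1 for b in b2]
    return sum(
        p * s1[i] * s2[j] for i, row in enumerate(joint) for j, p in enumerate(row)
    )


rng = random.Random(3)
n = 8
raw = [[rng.random() for _ in range(n)] for _ in range(n)]
total = sum(map(sum, raw))
joint = [[v / total for v in row] for row in raw]  # toy joint position distribution
b1 = [rng.randint(0, 1) for _ in range(n)]  # signal position mask
b2 = [rng.randint(0, 1) for _ in range(n)]  # idler position mask

ports = coincidence_ports(joint, b1, b2)
relative = ports["TT"] + ports["RR"] - ports["TR"] - ports["RT"]
print(relative, signed_projection(joint, b1, b2))
```

The two printed numbers agree because (2b₁ − 1)(2b₂ − 1) equals +1 for like ports and −1 for unlike ports.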

In  our  experiment,  we  use  a  slightly  more  sophisticated, 
but  conceptually  similar,  approach  (see  Ref.  [40])  that 
retains  the  transmission  and  rejection  modes  from  both 
momentum  and  position.  In  this  case,  there  are  16  possible 
correlation  measurements  that  are  combined  to  give  either 
position  or  momentum  information,  and  both  A  and  B  take 
values  “1”  and  “-1.” 

B.  Recovering  the  position  and  momentum 
distributions 

To obtain the joint-position and joint-momentum distributions
from our measurements, we turn to compressive
sensing (CS). Here, we exploit our expectation that
both distributions are highly correlated. Therefore, the
distributions are sparse in their natural (position-pixel or
momentum-pixel) representations: relatively few elements
in each distribution have significant values. This allows
us to dramatically under-sample so that M ≪ N. In this case,
there  are  many  possible  X  and  K  consistent  with  the 
measurements.  CS  posits  that  the  correct  X  and  K  are  the 
sparsest  distributions  consistent  with  the  measurements. 

Sparse  X  and  K  are  found  by  solving  a  pair  of 
optimization  problems 

$$
\min_K \; \frac{\mu}{2} \| Y^{(k)} - A K \|_2^2 + \mathrm{TV}(K),
\qquad
\min_X \; \frac{\mu}{2} \| Y^{(x)} - B X \|_2^2 + \mathrm{TV}(X), \tag{6}
$$

where $\|\cdot\|_2$ is the $\ell_2$ (Euclidean) norm and $\mu$ are
weighting constants. The first penalty is a least-squares
term that ensures the result is consistent with measured
data. The second penalty $\mathrm{TV}(\cdot)$ is the signal’s total
variation (TV), which is the norm of the discrete
gradient

$$
\mathrm{TV}(X) = \sum_{\mathrm{adj.}\ i,j} |X_i - X_j|, \tag{7}
$$

where i, j run over pairs of adjacent elements in the signal.
The  TV  regularization  promotes  structured,  sparse  signals 
over noisy, uncorrelated signals. Total variation minimization
has been extremely successful for compressed sensing
and denoising in the context of imaging [41-43]. In many
cases, a signal can be recovered from M as low as a few
percent of N. For a more complete introduction to compressive
sensing, see the excellent tutorials by Baraniuk
[44]  and  Candes  and  Wakin  [45]. 
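To make the total-variation penalty of Eq. (7) concrete, here is a minimal sketch that evaluates it on a two-dimensional array, taking "adjacent" to mean horizontal and vertical nearest neighbors (one common convention; the solver used in this work may define adjacency differently). The example images are invented for illustration.

```python
def total_variation(image):
    """Sum of |X_i - X_j| over horizontally and vertically adjacent pixel pairs."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                tv += abs(image[r][c] - image[r][c + 1])
            if r + 1 < rows:
                tv += abs(image[r][c] - image[r + 1][c])
    return tv


size = 8
# Structured signal: a 4 x 4 block of ones in the middle of an 8 x 8 frame.
block = [
    [1.0 if 2 <= r <= 5 and 2 <= c <= 5 else 0.0 for c in range(size)]
    for r in range(size)
]
# Same number of ones (16), scattered in a checkerboard over the top four rows.
scattered = [
    [1.0 if r < 4 and (r + c) % 2 == 0 else 0.0 for c in range(size)]
    for r in range(size)
]

print(total_variation(block), total_variation(scattered))
```

Both images contain the same total intensity, but the scattered one has a much larger total variation, which is why the TV term in Eq. (6) favors structured reconstructions over noise-like ones.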

Total  variation  minimization  is  also  extremely  effective 
for denoising signals [46]. Normally, this helps to mitigate
environmental and photon-counting shot noise ($\Phi$), but in
our case, it also largely removes the filtering measurement
disturbance $\Gamma$. With strong measurements, e.g., raster
scanning a pinhole aperture, one requires deconvolution
techniques to obtain a similar effect. Not only is deconvolution
far more challenging than denoising, it can never
recover high-frequency content beyond the aperture size.

CS  measurements  are  most  effective  in  a  representation 
that  is  incoherent,  or  maximally  unbiased,  with  respect  to  the 
sparse  representations  (in  our  case,  position  or  momentum). 
Fortunately, random projections perfectly suit this criterion,
leading to the surprising conclusion that random measurement
is actually preferable. Random matrices are overwhelmingly
likely to be restricted isometries that preserve
the relative distance between sparse signals, ensuring that
solving Eq. (6) returns the true signal instead of a sparse but
otherwise incorrect result [47]. Not only do random filters
extract information in complementary domains, they are
among the best measurements for leveraging CS.
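The distance-preserving behavior of random matrices can be illustrated numerically. The sketch below applies a random ±1 matrix, scaled by $1/\sqrt{M}$, to the difference of two sparse vectors and compares squared lengths. It checks a single pair of signals and is not a proof of the restricted isometry property; the vectors, the value of M, and the function name are arbitrary choices made for illustration.

```python
import random


def distance_ratio(x, y, m_rows=400, seed=7):
    """Ratio of projected to true squared distance between sparse vectors.

    x and y map index -> value. Only matrix columns on the joint support
    matter, so only those random signs are drawn.
    """
    rng = random.Random(seed)
    support = sorted(set(x) | set(y))
    diff = [x.get(i, 0.0) - y.get(i, 0.0) for i in support]
    true_sq = sum(v * v for v in diff)
    proj_sq = 0.0
    for _ in range(m_rows):
        row_dot = sum(rng.choice((-1, 1)) * v for v in diff)
        proj_sq += row_dot * row_dot / m_rows  # includes the 1/sqrt(M) scaling
    return proj_sq / true_sq


x = {3: 1.0, 40: 0.5, 900: 0.25}
y = {3: 0.2, 77: 1.0, 512: 0.7}
ratio = distance_ratio(x, y)
print(ratio)
```

A ratio near one means the two sparse signals remain distinguishable after projection, even though far fewer measurements than signal elements were taken.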

One  might  reasonably  ask  if  our  technique  employs 
circular  reasoning — assuming  the  distributions  are  highly 
correlated  in  order  to  then  measure  their  correlations.  This 
is  not  the  case.  The  initial  assumption  is  a  compressibility 
assumption; relative to all possible distributions, our distributions
are expected to be sparse in the natural pixel
basis.  We  do  not  know  exactly  how  sparse  the  distributions 
will  be,  or  which  elements  will  be  significant.  However,  the 
vast  majority  of  possible  distributions  are  just  unstructured 
noise — these  are  the  outcomes  we  are  initially  rejecting. 

The  assumption  is  similar  to  assuming  that  a  digital 
photograph  can  be  effectively  compressed  by  the  JPEG 
standard  [48].  A  natural  photographic  scene  contains  more 
low-spatial-frequency content than high-spatial-frequency
content  and  contains  objects  with  well-defined  edges  and 
recognizable  shapes — regardless  of  the  specific  scene. 

III.  EXPERIMENT 

Our  experimental  setup  is  shown  in  Fig.  2.  An  EPR-like 
state at 810 nm is generated by pumping a 1-mm-thick BiBO
crystal  oriented  for  type-I  collinear  SPDC  with  a  405-nm 
pump  laser.  The  generated  fields  propagate  to  a  spatial  light 
modulator  (SLM)  in  the  focal  plane  of  a  125-mm  lens. 
Because  the  phase-only  SLM  only  retards  one  polarization, 
it can perform per-pixel polarization rotation. These polarization
rotations are converted to intensity modulations with
a half-wave plate and a polarizing beam splitter. Random
masks that cause zero or π polarization rotations perform
the momentum filtering. We exploit the negative correlations
in the momentum state to assign signal and idler
particles to the left and right halves of the SLM, respectively.

FIG. 2. Experimental setup. A two-photon, EPR-like state is generated by pumping a nonlinear crystal for type-I SPDC. Random,
binary patterns placed on a SLM in a Fourier plane of the crystal and on DMDs in an image plane of the crystal implement a sequence of
random, partially projecting measurements. Example patterns are shown next to the SLM and DMDs; note the separate patterns for
signal and idler photons on the SLM. Coincident detection events between single-photon detectors for signal and idler photons give
information about both the joint-position and joint-momentum distributions of the two-photon state.

The  signal  and  idler  fields  are  routed  to  separate  digital 
micromirror  devices  (DMDs)  via  a  500-mm  lens  and  a 
50/50  beam  splitter;  the  DMDs  are  placed  in  a  crystal 
image  plane  with  4X  magnification.  A  DMD  is  a  two- 
dimensional  array  of  individually  addressable  mirrors,  each 
of  which  can  be  oriented  to  direct  light  towards  or  away 
from  a  detector.  These  correspond  to  the  transmit  and  reject 
ports  in  Fig.  1.  Random  patterns  placed  on  the  DMDs 
implement  the  position  filtering.  The  light  is  coupled  with 
10X  microscope  objectives  into  multimode  fibers  which  are 
connected to avalanche photodiodes operating in Geiger
(photon-counting)  mode.  A  correlator  records  coincident 
detection  events  between  filtered  signal  and  idler  photons. 

Single-particle sensing matrices $a^{(k_1)}$, $a^{(k_2)}$, $b^{(x_1)}$, and
$b^{(x_2)}$ are generated by taking M rows from randomly permuted
n × n Hadamard matrices. This allows the repeated calculations
of AK and BX performed by the solver to use a fast
Hadamard transform, decreasing computational requirements
[49]. Because we only collect transmitted modes
from both position and momentum filters, we require 16
separate measurements to collect all coincident combinations
of transmission and rejection for the four filters
(described in Ref. [40]). This is not required, in principle,
if  one  has  eight  detectors.  The  solver  we  use  for  Eq.  (6)  is 
TVAL3  [50].  The  full  measurement  and  reconstruction 
recipe  we  follow  is  similar  to  that  described  in  Ref.  [49]. 
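The computational benefit of Hadamard-based sensing comes from the fast Walsh-Hadamard transform, which applies an n × n Hadamard matrix in O(n log n) operations without storing it. The sketch below is a generic implementation, not the code used with TVAL3. In particular, treating "randomly permuted" as a random permutation of the columns followed by a random choice of M rows is our assumption, and all function names are ours.

```python
import random


def fwht(values):
    """Fast Walsh-Hadamard transform (Sylvester ordering); length must be 2**m."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


def hadamard_row(n, index):
    """Row `index` of the n x n Sylvester Hadamard matrix, entries +1 or -1."""
    return [1 - 2 * (bin(index & col).count("1") % 2) for col in range(n)]


def sample_projections(signal, m, seed=0):
    """Project `signal` onto m random rows of a column-permuted Hadamard matrix."""
    n = len(signal)
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)              # assumed form of the random permutation
    rows = rng.sample(range(n), m)  # which M rows are kept
    full = fwht([signal[p] for p in perm])
    return rows, perm, [full[r] for r in rows]


signal = [0.1, 0.4, 0.0, 0.2, 0.05, 0.05, 0.15, 0.05]
rows, perm, measurements = sample_projections(signal, 3)
print(rows, measurements)
```

Each entry of `measurements` equals the explicit inner product of the corresponding ±1 row with the permuted signal, but is obtained from a single fast transform.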

Note that our choice of a single momentum SLM and
two position DMDs was due to available equipment. One
would ideally use four SLMs to implement completely
separate position and momentum filtering for both the
signal and idler fields. The SLM is preferred for filtering
because of its high (>90%) diffraction efficiency in
contrast to the lower (~20%) diffraction efficiency for
the DMDs.

IV.  RESULTS 
A.  Signal  recovery 

Sample recovered joint signals for position and momentum
are given in Fig. 3 as returned directly by the solver.
The single-particle resolution was n = 16 × 16 pixels,
so the joint signal has dimensionality $N = n^2$ = 65,536.
For the sample image, M = 4439 random projections were
used, corresponding to M less than 0.07N. Positive correlations
in position and negative correlations in momentum
between signal and idler particles are clearly seen. The gaps
visible on the diagonal are an artifact of row-wise reshaping
to one dimension; these regions are physically outside the
marginal beam width.

B.  Reconstruction  noise 

Unfortunately,  the  images  shown  in  Fig.  3  do  not 
represent  valid  probability  distributions  due  to  the  presence 
of  weak,  zero-mean,  additive  noise  shown  in  Fig.  4.  Note 
that  solving  the  objective  function,  Eq.  (6),  does  not  strictly 
recover  a  valid  probability  distribution  as  it  allows  negative 
values.  We  found  that  current,  established  solvers  such  as 
TVAL3 performed better without such additional constraints;
improved, quantum-specific solvers are a topic
of  future  research. 

Figure  4(a)  shows  slices  of  the  joint-position 
reconstruction  along  the  signal  axis,  where  each  curve 
corresponds  to  a  particular  idler  pixel.  Zooming  in  on  a 
region  with  no  signal  in  Fig.  4(b),  we  observe  the  noise. 

FIG. 3. Representative recovered joint distributions in position and in momentum for a 16 × 16 pixel (N = 256 × 256) discretization.
Only M = 4439 measurements were needed, about 0.07N. Gaps along the position diagonal occur because of reshaping to one
dimension; these regions were outside the marginal width. Position and momentum units refer to the transverse plane at the nonlinear
crystal (z = 0).

FIG. 4. Reconstruction noise. (a) One-dimensional slices along the signal axis of the joint-position reconstruction from Fig. 3 reveal
the presence of zero-mean, additive Gaussian noise. The presence of negative values strongly suggests that this noise’s form is
nonphysical; the reconstruction process maps measurement uncertainty into this noise. A close-up of a noise-only region (signal pixels 5
to 30, all idler pixel spectra) is shown in (b). A histogram of outcomes (c) for the region shown in (b) demonstrates that the noise follows
Gaussian statistics with zero mean and standard deviation 0.014. To obtain a valid probability distribution, values below a chosen
threshold can be set to zero and the distribution normalized.

This  noise  contains  both  measurement  uncertainty  and 
solver  artifacts.  Potential  noise  sources  include  shot-noise, 
long-term  drift  in  the  pump  laser,  stray  light,  and  crystal 
temperature  instability.  Figure  4(c)  gives  a  histogram  of  the 
noise  shown  in  Fig.  4(b),  which  follows  Gaussian  statistics. 
An  appropriate  model  for  signals  returned  by  the  solver  is 
therefore 


$$
X^{(s)} = X + G^{(x)}, \tag{8}
$$

$$
K^{(s)} = K + G^{(k)}, \tag{9}
$$

where $X^{(s)}$ and $K^{(s)}$ refer to the signals returned by the solver
and $G^{(x)}$ and $G^{(k)}$ are additive, zero-mean Gaussian noise.

The  simplest  way  to  obtain  valid  probability  distributions 
is  to  threshold  values  below  a  small  percentage  of  the 
maximum  value  to  zero.  As  seen  in  Fig.  4(b),  any  threshold 
below  5%  removes  the  uniform  noise  floor  without 
removing  any  signal  peaks.  This  approach  is  similar  to 
the  common  technique  of  subtracting  dark  counts  from  data 
in  coincidence  measurements  and  other  noise-suppression 
techniques. 
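The thresholding step described above can be sketched in a few lines. The function name, the example values, and the 3% threshold are illustrative assumptions; the only ingredients taken from the text are zeroing values below a small fraction of the maximum and renormalizing.

```python
def threshold_and_normalize(values, fraction):
    """Zero entries below fraction * max(values), then rescale to unit sum."""
    cutoff = fraction * max(values)
    kept = [v if v >= cutoff else 0.0 for v in values]
    total = sum(kept)
    if total <= 0:
        raise ValueError("nothing survives the threshold")
    return [v / total for v in kept]


# Toy solver output: two signal peaks plus small positive and negative noise.
reconstruction = [1.0, 0.5, 0.01, -0.02, 0.02]
distribution = threshold_and_normalize(reconstruction, fraction=0.03)
print(distribution)
```

The result is non-negative and sums to one, so it can be used directly in entropy calculations.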

C.  Witnessing  entanglement 

To  witness  and  quantify  entanglement,  we  violate  an 
entropic  steering  inequality  [51-53]  (see  Ref.  [40]);  all 
classically  correlated  states  satisfy 

$$
H(X_1|X_2) + H(K_1|K_2) \geq \log\!\left(\frac{\pi e}{\Delta x\, \Delta k}\right), \tag{10}
$$

where $H(X_1|X_2)$ and $H(K_1|K_2)$ are the conditional, discrete
Shannon entropies of the respective position and
momentum joint distributions. Here, $\Delta k$ ($\Delta x$) is the width in
momentum (position) sampled by a single pattern pixel on
the SLM (DMD) in the transverse plane of the nonlinear
crystal. For position, $\Delta x$ is found by dividing the
physical width of a pattern pixel on the DMD by the
magnification of the imaging system. For momentum,
the physical width of a SLM pattern pixel $p_k$ is related
to $\Delta k$ via the Fourier-transforming property of a lens, so
$\Delta k = 2\pi p_k / (\lambda f)$, where $\lambda$ is the wavelength of light and $f$
is the lens focal length.

The  entropic  steering  inequality  is  powerful  because  it  is 
computed  directly  from  measured  probability  distributions 
and  does  not  require  a  density  matrix.  Remarkably,  despite 
being  a  function  of  discrete  distributions,  it  witnesses 
continuous-variable entanglement. Moreover, the amount by which the inequality is violated corresponds to a secret key rate for quantum key distribution [37,54].

The  conditional  entropies  in  position  and  momentum  for 
our  experimental  results  are  given  in  Fig.  5  as  a  function  of 
measurement  number.  Different  curves  correspond  to 
increased  levels  of  thresholding,  setting  values  below  a 
percentage  of  the  maximum  value  to  0.  A  sharp  transition 
from poor reconstruction to good reconstruction is clearly
demonstrated by dramatic drops in the conditional entropies
around M = 2000. This transition is characteristic of
compressed  sensing  as  the  number  of  measurements 
becomes  sufficient  to  accurately  reconstruct  the  signal 
[55] — strongly  suggesting  we  made  enough  measurements. 
For too-small M, reconstructions fail spectacularly and return unstructured noise. For a k-sparse signal (k out of N elements have significant intensity), the required number of measurements scales as $c\,k\log(N/k)$, where $c$ is a near-unity constant [21]. For M beyond the transition, one is sampling above the information rate. Traditionally, one is concerned with sampling at or beyond the Nyquist rate, where M = N.
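The scaling can be illustrated with a short calculation. The sparsity k = 256 used here is an assumption chosen for illustration (one significant joint element per signal pixel in an idealized, perfectly correlated 256 x 256 joint space); the constant c = 1 and the use of the natural logarithm are likewise assumptions, and the function name is hypothetical.

```python
import math

def required_measurements(k, n, c=1.0):
    """Compressed-sensing scaling estimate c * k * ln(n / k).

    Assumes the natural logarithm and a near-unity constant c.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return c * k * math.log(n / k)

N = 256 ** 2   # joint dimension quoted in the text: 65,536
k = 256        # assumed sparsity, for illustration only

m_estimate = required_measurements(k, N)
nyquist_fraction = 2000 / N  # fraction of N sampled at M = 2000

print(f"estimated measurements: {m_estimate:.0f}")
print(f"M = 2000 as a fraction of N: {nyquist_fraction:.3f}")
```

Under these assumptions the estimate is on the order of a thousand measurements, far below the Nyquist requirement M = N.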

In momentum, the conditional entropy drops to nearly zero; in position, it drops to less than 2 bits. The position entropy likely levels off because of slight pixel misalignment between the two position DMDs. Physically, this indicates that a particular signal-position pixel is correlated to about four idler pixels (since $2^2 = 4$), whereas a particular signal-momentum pixel is only correlated to one idler pixel.

FIG. 5. Conditional entropy versus measurement number. A sharp transition from high to low conditional entropy is seen as the number of measurements increases. Note that $N = 256^2$, so M = 2000 is only 0.03N. Different curves correspond to different levels of thresholding to remove the noise floor. Bold lines indicate an average over nine trials. Faded lines enclose up to 4 standard deviations about the mean. When the conditional entropy sum is below the bound, the state is entangled.


The  steering  inequality  is  violated  with  as  little  as 
2%  thresholding,  and  by  over  6  bits  for  thresholding 
beyond  7%. 

FIG. 6. Effect of thresholding. The effect of thresholding to remove weak background noise on the conditional entropy (a) and mutual information (b) is given. The bold line gives the average for nine trials; faded lines give the results from the individual trials. M = 4439 measurements were used. When the conditional entropy sum is below the bound, the state is entangled.

The effect of thresholding for M = 5000 is given in Fig. 6. Figure 6(a) shows the conditional entropies for position, momentum, and their sum with the corresponding entanglement bound. Figure 6(b) gives the mutual information $I(X_1:X_2)$ and $I(K_1:K_2)$, where, for example,

$I(X_1:X_2) = H(X_1) + H(X_2) - H(X_1, X_2)$.  (11)

Here, $H(X_1, X_2)$ is the Shannon entropy of the joint distribution, and $H(X_1)$ and $H(X_2)$ are Shannon entropies of the marginal, single-particle distributions. From information theory, this mutual information provides a maximum bit rate for communication with joint-position or joint-momentum representations for this system [56]. The mutual information rises as a function of thresholding, indicating that thresholding is not trivially decreasing the conditional entropies and that the most likely joint outcomes are the most highly correlated. Again, the momentum mutual information is larger because of slight optical misalignments for the position DMDs.
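The mutual information of Eq. (11) can be computed directly from a joint distribution. The sketch below assumes the joint distribution is a 2D NumPy array whose rows index particle 1 and whose columns index particle 2; the function names are hypothetical, and the two test distributions are idealized limits rather than measured data.

```python
import numpy as np

def shannon_entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information_bits(joint):
    """I(X1:X2) = H(X1) + H(X2) - H(X1, X2) for joint[x1, x2]."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    h1 = shannon_entropy_bits(joint.sum(axis=1))   # marginal of particle 1
    h2 = shannon_entropy_bits(joint.sum(axis=0))   # marginal of particle 2
    h12 = shannon_entropy_bits(joint)              # joint entropy
    return h1 + h2 - h12

# Idealized limits on an 8 x 8 grid:
perfectly_correlated = np.eye(8) / 8           # each pixel pairs with exactly one partner
uncorrelated = np.full((8, 8), 1.0 / 64)       # product of uniform marginals

mi_correlated = mutual_information_bits(perfectly_correlated)
mi_uncorrelated = mutual_information_bits(uncorrelated)
```

The perfectly correlated case gives $\log_2 8 = 3$ bits and the uncorrelated case gives zero, which brackets the behavior expected of the measured distributions.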

An  important  point  is  that  the  thresholded  signal  peaks 
still  retain  the  additive  Gaussian  noise  from  the 
reconstruction  process.  Because  of  the  data-processing 
inequality  [56],  this  noise  cannot  decrease  the  conditional 
entropy  and  cannot  increase  the  mutual  information  (this 
would  be  like  arguing  that  a  noisy  channel  is  better  for 
communication  than  its  noiseless  counterpart).  Therefore, 
we  conservatively  underestimate  our  ability  to  violate  the 
steering  witness  [see  Eq.  (10)]. 

V.  CONCLUSION 

We  have  demonstrated  that  local,  random  filtering 
in  momentum  followed  by  local,  random  filtering  in 
position — of  the  same  photons — can  recover  sharp,  joint 
distributions  for  both  observables.  This  is  not  possible  with 
standard, projective measurements that localize photons in
either position or momentum. Using the expectation that
the  signals  will  be  highly  correlated  allows  us  to  use  many 
fewer  measurements  than  dimensions  in  the  system  via 
techniques  of  compressed  sensing.  We  strongly  emphasize 
that  we  have  not  violated  any  uncertainty  relations;  instead, 
we have chosen nonprojective measurements whose disturbance
can easily be mitigated.